
Fix chat response truncation and raw JSON display#635

Merged
Chris0Jeky merged 6 commits into main from feature/616-chat-json-truncation
Mar 31, 2026

Conversation

@Chris0Jeky
Owner

Summary

  • Increase default MaxTokens from 1024 to 2048 to reduce truncation frequency
  • Add finish_reason extraction to OpenAI provider; mark as degraded when finish_reason: "length"
  • Add finishReason extraction to Gemini provider; mark as degraded when finishReason: "MAX_TOKENS"
  • Both providers now detect partial JSON (starts with { but fails parse) when JSON mode was requested, and mark as degraded with reason "Response was truncated"
  • Frontend AutomationChatView.vue detects assistant messages that look like truncated JSON and shows "This response was cut short. Try a simpler question or rephrase." instead of raw broken JSON

Closes #616
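The two backend signals described above can be sketched as follows (TypeScript for brevity; the shipped code is C#, and `isTruncated` is an illustrative name, not an identifier from this PR):

```typescript
// Sketch of the combined truncation check: a provider-reported finish
// reason, plus a partial-JSON check when JSON mode was requested.
function isTruncated(
  finishReason: string | null,
  content: string,
  jsonMode: boolean
): boolean {
  // Signal 1: the provider stopped because the token budget ran out.
  if (finishReason === 'length' || finishReason === 'MAX_TOKENS') return true
  // Signal 2: JSON mode was requested but the payload does not parse.
  if (jsonMode) {
    const trimmed = content.trim()
    if (trimmed.startsWith('{')) {
      try {
        JSON.parse(trimmed)
      } catch {
        return true
      }
    }
  }
  return false
}
```

Note the shipped providers compare finish reasons case-insensitively; the exact comparison above is a simplification.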

Test plan

  • All 1929 backend tests pass (0 failures)
  • Frontend typecheck passes
  • Verify with a live OpenAI/Gemini provider that truncated responses show the degraded badge
  • Verify the frontend truncation notice renders for messages starting with { that aren't valid JSON
  • Verify valid JSON responses still render normally through the markdown pipeline

The previous 1024 limit was too low, causing LLM responses to be
truncated mid-JSON when structured output was requested. Fixes #616.

Extract finish_reason from the OpenAI response. When it is "length",
mark the result as degraded with reason "Response was truncated".
Also detect invalid JSON when JSON mode was requested, which indicates
the response was cut off before the model could produce valid output.

Extract finishReason from the Gemini response candidates. When it is
"MAX_TOKENS", mark the result as degraded with reason "Response was
truncated". Also detect invalid JSON when JSON mode was requested.

When an assistant message starts with '{' but is not valid JSON, the
frontend now shows "This response was cut short. Try a simpler question
or rephrase." instead of rendering the raw broken JSON.

Plain-text LLM responses should still fall through to the static
classifier. Only flag content that starts with '{' but fails JSON
parsing, which indicates truncated structured output.
@Chris0Jeky
Owner Author

Adversarial Self-Review

What looks correct

  • MaxTokens increase from 1024 to 2048 is straightforward and safe (callers can still override).
  • finish_reason extraction in OpenAI uses "length" (correct per OpenAI docs); Gemini uses "MAX_TOKENS" (correct per Gemini docs). Both use OrdinalIgnoreCase.
  • LooksLikeTruncatedJson correctly requires the string to start with { before flagging, so plain-text LLM responses pass through to the existing static classifier path.
  • Frontend isTruncatedJson mirrors the backend logic and short-circuits for non-{ content.
  • All 1929 existing backend tests pass. Frontend typecheck is clean.

Potential issues found

  1. Duplicated LooksLikeTruncatedJson method: The same helper is defined identically in both OpenAiLlmProvider and GeminiLlmProvider. Could be extracted to a shared static utility class. Leaving as-is since both are internal static and the duplication is small (10 lines each), but worth noting for future cleanup.

  2. Frontend truncation check only runs for assistant/system messages: This is correct — user messages should never hit this path. The v-if chain ordering is isTruncatedJson first, then renderMarkdown second, which prevents raw JSON from being rendered through the markdown pipeline.

  3. Edge case — valid JSON that starts with { in assistant messages: If an assistant legitimately returns a JSON string like {"reply":"hello"}, the frontend isTruncatedJson returns false (JSON parses successfully), so it renders through markdown as normal. This is correct behavior.

  4. Edge case — finish_reason: null from OpenAI: The string.Equals call with a null finishReason is safe — string.Equals(null, "length", ...) returns false. No null-ref risk.

  5. No new backend tests for the truncation detection paths: The existing tests cover the happy paths and validate that plain-text responses still work. However, there are no tests specifically for finish_reason: "length" producing a degraded result, or for LooksLikeTruncatedJson returning true. These would strengthen confidence but the logic is straightforward enough that manual testing covers the risk.
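The frontend mirror of that helper can be exercised with a quick table-driven check. A sketch using the `isTruncatedJson` shape from AutomationChatView.vue (the cases are illustrative):

```typescript
// The frontend mirror of the backend LooksLikeTruncatedJson helper.
function isTruncatedJson(content: string): boolean {
  if (!content) return false
  const trimmed = content.trim()
  if (!trimmed.startsWith('{')) return false
  try {
    JSON.parse(trimmed)
    return false
  } catch {
    return true
  }
}

// Table-driven cases mirroring the backend contract.
const cases: Array<[string, boolean]> = [
  ['{"reply":"hello"}', false], // valid JSON renders normally
  ['{"reply":"hel', true],      // cut off mid-string
  ['plain text answer', false], // falls through to static classifier
  ['', false],                  // empty content is not flagged
]
for (const [input, expected] of cases) {
  console.assert(isTruncatedJson(input) === expected, input)
}
```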

Acceptance criteria check

  • MaxTokens = 2048
  • Backend JSON truncation detection (partial JSON + finish_reason)
  • Frontend truncated JSON display with friendly message
  • finish_reason checking for OpenAI ("length") and Gemini ("MAX_TOKENS")

No issues requiring immediate fixes found.


@gemini-code-assist bot left a comment


Code Review

This pull request implements truncation detection for LLM responses in the Gemini and OpenAI providers by monitoring provider-specific finish reasons and validating JSON integrity. It also increases the default maximum token count and introduces a frontend notice for truncated messages. The review feedback identifies that the JSON truncation logic is currently fragile and duplicated across the backend and frontend; it is recommended to centralize this logic into a shared utility and enhance it to handle conversational prefixes or markdown formatting by using more robust brace-matching.

Comment on lines +369 to +387
internal static bool LooksLikeTruncatedJson(string text)
{
    if (string.IsNullOrWhiteSpace(text))
        return false;

    var trimmed = text.TrimStart();
    if (!trimmed.StartsWith('{'))
        return false;

    try
    {
        using var doc = JsonDocument.Parse(trimmed);
        return false;
    }
    catch (JsonException)
    {
        return true;
    }
}


medium

The current implementation of LooksLikeTruncatedJson is fragile because it assumes the response starts exactly with a {. LLMs occasionally include conversational prefixes or markdown code blocks (e.g., "Here is the JSON: ```json ...") even when instructed to return JSON.

Additionally, this logic is duplicated in OpenAiLlmProvider.cs. Consider moving this utility to a shared location like LlmInstructionExtractionPrompt and making it more robust by searching for the first and last braces, similar to the logic used in TryParseStructuredResponse.

    internal static bool LooksLikeTruncatedJson(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return false;
        var trimmed = text.Trim();
        var first = trimmed.IndexOf('{');
        var last = trimmed.LastIndexOf('}');

        if (first >= 0 && last > first)
        {
            try
            {
                using var doc = JsonDocument.Parse(trimmed[first..(last + 1)]);
                return false;
            }
            catch (JsonException) { }
        }

        return first >= 0;
    }

Comment on lines +353 to +371
internal static bool LooksLikeTruncatedJson(string text)
{
    if (string.IsNullOrWhiteSpace(text))
        return false;

    var trimmed = text.TrimStart();
    if (!trimmed.StartsWith('{'))
        return false;

    try
    {
        using var doc = JsonDocument.Parse(trimmed);
        return false;
    }
    catch (JsonException)
    {
        return true;
    }
}


medium

This method is identical to the one in GeminiLlmProvider.cs. To improve maintainability and robustness, consider moving this logic to a shared utility class (e.g., LlmInstructionExtractionPrompt) and updating it to handle cases where the JSON is wrapped in markdown or preceded by text.

    internal static bool LooksLikeTruncatedJson(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return false;
        var trimmed = text.Trim();
        var first = trimmed.IndexOf('{');
        var last = trimmed.LastIndexOf('}');

        if (first >= 0 && last > first)
        {
            try
            {
                using var doc = JsonDocument.Parse(trimmed[first..(last + 1)]);
                return false;
            }
            catch (JsonException) { }
        }

        return first >= 0;
    }

Comment on lines +31 to +41
function isTruncatedJson(content: string): boolean {
  if (!content) return false
  const trimmed = content.trim()
  if (!trimmed.startsWith('{')) return false
  try {
    JSON.parse(trimmed)
    return false
  } catch {
    return true
  }
}


medium

The isTruncatedJson heuristic on the frontend shares the same robustness issue as the backend: it fails to detect truncation if the assistant includes any leading text or markdown formatting. Improving the brace-matching logic here will ensure the truncation notice is shown even when the response isn't a pure JSON string.

function isTruncatedJson(content: string): boolean {
  const trimmed = (content || '').trim()
  const first = trimmed.indexOf('{')
  const last = trimmed.lastIndexOf('}')

  if (first >= 0 && last > first) {
    try {
      JSON.parse(trimmed.substring(first, last + 1))
      return false
    } catch {}
  }
  return first >= 0
}

Cover OpenAI finish_reason=length, Gemini finishReason=MAX_TOKENS,
OpenAI JSON-mode truncation, and LooksLikeTruncatedJson unit tests
for both providers.
@Chris0Jeky
Owner Author

Self-review follow-up: tests added

Addressed finding #5 from the self-review (no tests for truncation paths). Added 13 new tests:

  • OpenAiLlmProviderTests.CompleteAsync_ShouldReturnDegraded_WhenFinishReasonIsLength -- verifies finish_reason "length" produces degraded result
  • OpenAiLlmProviderTests.CompleteAsync_ShouldReturnDegraded_WhenJsonModeResponseIsInvalidJson -- verifies truncated JSON in JSON mode produces degraded result
  • OpenAiLlmProviderTests.LooksLikeTruncatedJson_ShouldDetectPartialJson (5 cases) -- unit tests for the helper
  • GeminiLlmProviderTests.CompleteAsync_ShouldReturnDegraded_WhenFinishReasonIsMaxTokens -- verifies Gemini MAX_TOKENS produces degraded result
  • GeminiLlmProviderTests.LooksLikeTruncatedJson_ShouldDetectPartialJson (5 cases) -- unit tests for the helper

All 1942 backend tests pass. Frontend typecheck clean.

@Chris0Jeky
Owner Author

Adversarial Code Review -- PR #635

Reviewed the full diff and surrounding file context. Findings below, ordered by severity.


1. [HIGH] Frontend isTruncatedJson hides valid truncated responses instead of using backend IsDegraded

The backend already marks truncated responses with IsDegraded: true and DegradedReason: "Response was truncated", and the existing template already renders a degraded warning banner when message.messageType === 'degraded' (line 611). The frontend isTruncatedJson check is a redundant, weaker re-implementation that:

  • Hides the actual (partial) content entirely, replacing it with a generic notice. The backend path shows the content plus the degraded banner -- much more useful for debugging.
  • Can false-positive on legitimate JSON responses from non-JSON-mode paths. If an assistant message happens to start with { and has a typo (e.g., trailing comma), the frontend will swallow it.
  • Does not trigger for array-shaped truncated JSON ([{"item":...), though this is a minor gap since the backend uses JSON object mode.

Suggestion: Remove the frontend isTruncatedJson guard entirely. The backend truncation detection (finish_reason + LooksLikeTruncatedJson) already sets IsDegraded, and the existing degraded message UI handles it. If there is a scenario where truncated JSON reaches the frontend without the degraded flag (e.g., stored historical messages), fix it at the API/persistence layer, not with client-side heuristics.

If the frontend guard must stay as defense-in-depth, it should at minimum show the raw content below the notice (wrapped in a <pre> or code block) so the user can still see what was returned.
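If the guard does stay, one minimal sketch of that fallback (`TRUNCATION_NOTICE` and `truncatedDisplayText` are illustrative names, not identifiers from this PR):

```typescript
// Sketch only: keep the partial payload visible alongside the friendly
// notice, rather than replacing the content entirely.
const TRUNCATION_NOTICE =
  'This response was cut short. Try a simpler question or rephrase.'

function truncatedDisplayText(content: string): string {
  // The UI would render `content` inside a <pre>/code block; here we
  // just concatenate so the raw payload is never lost.
  return TRUNCATION_NOTICE + '\n\n' + content
}
```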


2. [MEDIUM] LooksLikeTruncatedJson is copy-pasted across two providers

Exact duplicate in OpenAiLlmProvider and GeminiLlmProvider (and tested independently in both test files). This violates DRY and means a future fix (e.g., for finding #3 below) must be applied in two places.

Suggestion: Extract to a shared static utility:

// In Taskdeck.Application/Services/LlmResponseValidator.cs (or similar)
internal static class LlmResponseValidator
{
    public static bool LooksLikeTruncatedJson(string text) { /* ... */ }
}

Both providers call LlmResponseValidator.LooksLikeTruncatedJson(content). Tests move to a single LlmResponseValidatorTests class.


3. [MEDIUM] LooksLikeTruncatedJson only checks leading { -- misses markdown-wrapped and prefixed JSON

LLMs frequently return responses like:

Here is the JSON:
{"reply":"hello","actionable":true

or markdown code fences:

```json
{"reply":"hello"

The current implementation returns false for both because the trimmed text does not start with {. Since JSON mode is explicitly requested (OpenAI response_format: json_object, Gemini responseMimeType: application/json), these prefixed forms should be rare -- but the entire point of this PR is handling misbehaving LLM output, so the edge case matters.

Suggestion: If LooksLikeTruncatedJson stays narrow (only {-prefixed), rename it to make the contract explicit: StartsWithBraceButInvalidJson. Alternatively, add a lightweight extraction step:

internal static bool LooksLikeTruncatedJson(string text)
{
    if (string.IsNullOrWhiteSpace(text)) return false;
    var trimmed = text.TrimStart();

    // Strip markdown code fence if present
    if (trimmed.StartsWith("```"))
    {
        var newlineIdx = trimmed.IndexOf('\n');
        if (newlineIdx >= 0)
            trimmed = trimmed[(newlineIdx + 1)..].TrimStart();
    }

    if (!trimmed.StartsWith('{') && !trimmed.StartsWith('['))
        return false;

    try { using var doc = JsonDocument.Parse(trimmed); return false; }
    catch (JsonException) { return true; }
}

4. [MEDIUM] MaxTokens default change from 1024 to 2048 is a silent API contract change

ChatCompletionRequest.MaxTokens has a default of 1024 in the current codebase. Changing it to 2048 doubles the token budget for every caller that does not explicitly pass MaxTokens -- including probe requests (MaxTokens: 4) which are fine, but also any future caller that relies on the default. This doubles cost per request for OpenAI/Gemini.

This isn't necessarily wrong, but it deserves a note in docs/STATUS.md or a changelog entry, and ideally an explicit decision rather than a side-effect of a truncation-fix PR. The probe request already overrides to 4, so it's not broken -- but the cost implication should be acknowledged.

Suggestion: Either add a comment in the PR description acknowledging the cost doubling, or make this a separate, explicit commit/PR.


5. [MEDIUM] OpenAI: useInstructionExtraction is computed differently in CompleteAsync vs BuildRequestPayload

In the PR diff for OpenAI, the truncation check introduces:

var useInstructionExtraction = request.SystemPrompt is null;
if (useInstructionExtraction && LooksLikeTruncatedJson(content))

This duplicates the same expression from BuildRequestPayload (line 195). If the logic for "when is JSON mode active" ever changes (e.g., a new flag), it must be updated in two places within the same file. In Gemini, useInstructionExtraction is already a local in CompleteAsync, so it's fine there.

Suggestion: In OpenAI's CompleteAsync, compute useInstructionExtraction once at the top (before the HTTP call) and reuse it, rather than recomputing after the response arrives.


6. [LOW] Test coverage gap: no test for finish_reason: "stop" + valid JSON (happy path with new parsing)

The tests cover finish_reason: "length" (OpenAI) and finishReason: "MAX_TOKENS" (Gemini), and LooksLikeTruncatedJson unit tests. But there's no test verifying that finish_reason: "stop" with valid JSON content still produces a non-degraded result through the new code paths. This is the regression-safety test -- proving the new checks don't accidentally flag normal responses.


7. [LOW] Frontend: isTruncatedJson swallows JSON.parse errors silently

The catch {} block in isTruncatedJson catches all errors, not just syntax errors. In practice JSON.parse only throws SyntaxError, so this is fine, but an explicit catch (e) { return e instanceof SyntaxError } would be more precise.
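A sketch of that more explicit form (behavior-equivalent in practice, since `JSON.parse` only throws `SyntaxError`):

```typescript
function isTruncatedJson(content: string): boolean {
  const trimmed = (content || '').trim()
  if (!trimmed.startsWith('{')) return false
  try {
    JSON.parse(trimmed)
    return false
  } catch (e) {
    // Only treat genuine parse failures as truncation; anything else
    // (which JSON.parse should never throw) is not flagged.
    return e instanceof SyntaxError
  }
}
```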


8. [LOW] Inconsistent truncation result construction

Both providers construct nearly identical LlmCompletionResult objects for truncation (two copies in each provider -- one for finish_reason, one for JSON validation). This is 4 near-identical blocks. Consider a private helper:

private static LlmCompletionResult BuildTruncatedResult(
    string content, int tokensUsed, string provider, string model)
{
    return new LlmCompletionResult(
        content, tokensUsed,
        IsActionable: false,
        Provider: provider,
        Model: model,
        IsDegraded: true,
        DegradedReason: "Response was truncated");
}

Verdict

No critical/blocking issues. The core truncation detection logic (finish_reason checking) is correct and well-tested. The main concerns are findings #1 (the redundant frontend guard) and #2 (the duplicated helper); I would recommend addressing those before merging. The rest can be follow-up items.

@Chris0Jeky merged commit 34d632a into main on Mar 31, 2026
18 checks passed
@Chris0Jeky deleted the feature/616-chat-json-truncation branch on March 31, 2026 at 14:52
@github-project-automation bot moved this from Pending to Done in Taskdeck Execution on Mar 31, 2026


Development

Successfully merging this pull request may close these issues.

LLM-01: Chat — fix response truncation and raw JSON display

1 participant